Abstract 

We consider a computing system where a master processor assigns tasks for execution to worker processors 
through the Internet. We model the workers' decision of whether to comply (compute the task) or not (return a bogus 
result to save the computation cost) as a mixed extension of a strategic game among workers. That is, we assume that 
workers are rational in a game-theoretic sense, and that they randomize their strategic choice. Workers are assigned 
multiple tasks in subsequent rounds. We model the system as an infinitely repeated game of the mixed extension of 
the strategic game. In each round, the master decides stochastically whether to accept the answer of the majority or 
verify the answers received, at some cost. Incentives and/or penalties are applied to workers accordingly. 

Under the above framework, we study the conditions under which the master can reliably obtain task results, 
exploiting that the repeated games model captures the effect of long-term interaction. That is, workers take into 
account that their behavior in one computation will have an effect on the behavior of other workers in the future. 
Indeed, should a worker be found to deviate from some agreed strategic choice, the remaining workers would change 
their own strategy to penalize the deviator. Hence, being rational, workers do not deviate. 

We identify analytically the parameter conditions to induce a desired worker behavior, and we evaluate experimentally 
the mechanisms derived from such conditions. We also compare the performance of our mechanisms with 
a previously known multi-round mechanism based on reinforcement learning. 

Keywords: Internet Computing; Master-Worker Task Computing; Game Theory; Repeated Games; Algorithmic 
Mechanism Design 
I. Introduction 

Motivation and prior work. The processing power of top supercomputers has reached speeds in the order of 
PetaFLOPs ffl. However, the high cost of building and maintaining such multiprocessor machines makes them 
accessible only to large companies and institutions. Given the drastic increase of the demand for high performance 
computing, Internet-based computing has emerged as a cost-effective alternative. One could categorize Internet- 
based computing into two categories: administrative computing and master-worker computing. In the first one, 
the computing elements are under the control of an administrator. Users of such infrastructure have to bear the 
cost of access, which depends on the quality of service they require. Examples of such systems are Grid and 
Cloud computing. Master-worker computing (also known as Desktop Grid Computing) exploits the growing use 
of personal computers and their capabilities (e.g., CPU and GPU) and their high-speed access to the Internet, in 
providing an even cheaper high performance computing alternative. In particular, personal computing devices all 
around the world are accessed through the Internet and are used for computations; these devices are called workers 
and the tasks (computation jobs) are assigned by a master entity (the one that needs the outcome of the tasks’ 
computation). At present, Internet-based master-worker computing is mostly embraced by the scientific community 
in the form of volunteer computing, where computing resources are volunteered by the public to help solve scientific 
problems. Among the most popular volunteering projects is SETI@home ||2| running on the BOINC 0 platform. 
A profit-seeking computation platform has also been developed by Amazon, called Mechanical Turk BIlH Although 
the potential is great, the use of master-worker computing is limited by the untrustworthy nature of the workers, who 
might report incorrect results 13, |l5l-||71- In SETI, the master attempts to minimize the impact of these incorrect 
results by assigning the same task to several workers and comparing their outcomes (i.e., redundant task allocation 
is employed 0). 

Prior work, building on redundant task allocation, has considered different approaches in increasing the reliability 
of master-worker computing ISl- lflTll . One such approach is to consider workers to be rational ifTSl . ifT^ in a game- 
theoretic sense, that is, each worker is selfish and decides whether to truthfully compute and return the correct result 
or return a bogus result, based on the strategy that best serves its self-interest (increases its benefit). The rationality 
assumption can conceptually be justified by the work of Shneidman and Parkes El where they reason on the 
connection of rational players (of Algorithmic Mechanism Design) and workers in realistic P2P systems. Several 
incentive-based algorithmic mechanisms have been devised, e.g., M-M, im, that employ reward/punish schemes 
to “enforce” rational workers to act correctly, and hence having the master reliably obtain correct task results. Most 
of these mechanisms are one-shot in the following sense: in a round, the master sends a task to be computed 
to a collection of workers, and the mechanism, using auditing and reward/punish schemes guarantees (with high 
probability) that the master gets the correct task result. For another task to be computed, the process is repeated 
but without taking advantage of the knowledge gained. 

The work in El takes advantage of the repeated interactions between the master and the workers, by studying 
the dynamics of evolution |[T9l of such master-worker computations through reinforcement learning 1201 where both 
the master and the workers adjust their strategies based on their prior interaction. The objective of the master is 
to reach a state in the computation after which it always obtains the correct results (called eventual correctness), 
while the workers attempt to increase their benefit. Roughly speaking, in each round a different task is assigned to 
the workers, and when the master collects the responses it decides with some probability to verify these answers 
or not; verification is costly to the master. Each worker decides with some probability to cheat, that is, return a 
bogus result without computing the task (to save the cost of doing so), or to be honest, that is, truthfully return the 
correct result. If the master verifies, it then penalizes the cheaters and rewards the honest workers. Also, based on 
the number of cheaters, it might decide to increase or decrease the probability of verification for the next round. 
If the master does not verify, then it accepts the result returned by the majority of the workers and rewards this 
majority (it does not penalize the minority); in this case, it does not change its probability of verifying. Similarly, 
depending on the payoff received in a given round (reward or punishment minus the cost of performing the task, if 
it performed it), each worker decides whether to increase or decrease its probability of cheating. It was shown that, 
under certain conditions, eventually workers stop cheating and the master always obtains the correct task results 
with minimal verification. 

¹Although in Amazon’s Mechanical Turk many tasks are performed by humans, even such cases can be seen as a computational platform, 
one where processors are indeed humans. 

Our approach. In this work, we take a different approach in exploiting the repeated interactions between the master 
and the rational workers. We model this repeated interaction as a repeated game [21]. Unlike the work in ifTSll, as 
long as the workers operate within this framework, the master obtains the correct task results (with high probability) 
from the very first round. The main idea is the following: when the workers detect that one worker (or more) has 
deviated from an agreed strategic choice, then they change their strategy into the one that maximizes the negative 
effect they have on the utility of the deviating worker. This might negatively affect their own utility as well, but 
in long-running computations (such as master-worker computations) this punishment threat stops workers from 
deviating from the agreed strategic choice. So, indeed, workers do not deviate. (For more details on the theory of 
repeated games please refer to [21].) As we demonstrate later, under certain conditions, not only does the master obtain 
the correct results, but in the long run it does so at a lower cost than the repeated use of the 
one-shot mechanism of ifTSl or the reinforcement learning mechanism of ifTSl . 

Contribution. In summary, the main contributions of this work are the following: 

• To the best of our knowledge, this is the first work that attempts to increase the reliability of Internet-based 
master-worker computing by modeling the repeated interaction between the master and the workers as a 
repeated game. (The model is formalized in Section II.) 

• We first present a mechanism (Section III) where workers decide to cheat or be honest deterministically 
(in game-theoretic terms they follow pure equilibria strategies), and prove the conditions and the cost under 
which the master obtains the correct task result in every round (with high probability). In order to allow the 
workers to detect other workers’ deviations (from the agreed strategy), the master needs to provide only the 
number of different answers received (regardless of whether it verified or not). 

• Then, we consider the case where the workers’ decision is probabilistic (in game-theoretic terms they follow 
mixed equilibria strategies). The mechanism (Section IV) in this case is more involved as workers need more 
information in order to detect deviations. The master provides the workers which answers it has received and 
how many of each, and the workers use this information to detect deviations. We prove the conditions and 
the cost under which the master obtains the correct task result in every round (with high probability). 
• Finally, we perform a simulation study to demonstrate the utility of our new approach (Section V). The study 
complements the theoretical analysis by providing more insight into the effectiveness of the mechanisms by 
experimenting with various parameter values, and also provides a comparison with the works in ifTSl and ifTSl . 
In particular, our simulations show that in the presence of $\lceil n/2 \rceil$ deviators (out of $n$ total workers), which 
is the minimum needed to have an impact in voting mechanisms, our mechanism performs similarly to or better than 
the reinforcement learning mechanism of O], and both mechanisms perform significantly better than the 
repeated use of the one-shot mechanism of csi. 

Other related work. A classical Distributed Computing approach for increasing the reliability of master-worker 
computing is to model the malfunctioning (due to a hardware or a software error) or cheating (intentional wrongdoer) 
as malicious workers that wish to hamper the computation and thus always return an incorrect result. The non-faulty 
workers are viewed as altruistic ones El that always return the correct result. Under this view, malicious-tolerant 
protocols have been considered, e.g., Ei-noi, where the master decides on the correct result based on majority 
voting. More recent works, e.g., m, ES, have combined this approach with incentive-based game theoretic 
approaches and devised mechanisms assuming the co-existence of altruistic, malicious and rational workers. The 
work in [22] employed worker reputation to cope with malice, while the work in IH relied on statistical information 
on the distribution of the worker types (altruistic, malicious or rational). Extending our present work to cope with 
malicious workers is an interesting future direction. 

II. Model 

The master-workers framework. We consider a distributed system consisting of a master processor that assigns, over 
the Internet, computational tasks to a set W of n workers. In particular, the computation is broken into rounds. 
In each round the master sends a task to be computed to the workers and the workers return the task result. The 
master, based on the workers’ replies, must decide on the value it believes is the correct outcome of the task in 
the same round. The tasks considered in this work are assumed to have a unique solution; although such limitation 
reduces the scope of application of the presented mechanism there are plenty of computations where the 
correct solution is unique: e.g., any mathematical function. In this work security issues are not considered. Security 
can be achieved by cryptographic means, as done in BOINC EIj which allows for encrypting communication, 
authenticating master and workers, signing the code of tasks, and executing tasks in sandboxes. 

Following CD and Cll, we consider workers to be rational, that is, they are selfish in a game-theoretic sense 
and their aim is to maximize their benefit (utility) under the assumption that other workers do the same. In the 
context of this paper, a worker is honest in a round when it truthfully computes and returns the task result, and it 
cheats when it returns some incorrect value. We denote by $p_{C_i}^{r}$ the probability of a worker $i$ cheating in round $r$. 
Note that we do not consider non-intentional errors produced by hardware or software problems. 

To “enforce” workers to be honest, the master employs, when necessary, verification and reward/punish schemes. 
The master, in a round, might decide to verify the response of the workers, at a cost. It is assumed that verifying an 
answer is more efficient than computing the task [24] (e.g., NP-complete problems if P ≠ NP), but the correct 
result of the computation is not obtained if the verification fails (e.g., when all workers cheat). We denote by pv the 
probability of the master verifying the responses of the workers. The goal of the master is to accept the correct task 
result in every round, while maximizing its utility; therefore, verification needs to be used only when it is necessary. 




Furthermore, the master can reward and punish workers using the following scheme: When the master verifies, 
it can accurately reward and punish workers. When the master does not verify, it decides on the majority of the 
received replies, and it rewards only the majority (and it does not penalize the minority); probability is used to 
break symmetry. This is essentially the reward model $\mathcal{R}_m$ on the game $0:n$ as defined in IfTSII (also considered in 
other works, e.g., m, m). 

The payoff parameters considered in this work are detailed in Table I. Observe that there are different parameters 
for the reward $WB_A$ to a worker and the cost $MC_A$ of this reward to the master. This models the fact that the cost 
to the master might be different from the benefit for a worker. 


$WP_C$    worker’s penalty for being caught cheating
$WC_T$    worker’s cost for computing the task
$WB_A$    worker’s benefit from master’s acceptance
$MP_W$    master’s penalty for accepting a wrong answer
$MC_A$    master’s cost for accepting the worker’s answer
$MC_V$    master’s cost for auditing workers’ answers
$MB_R$    master’s benefit from accepting the right answer

TABLE I. Payoffs. The parameters are non-negative. 


For the purposes of the repeated game framework (presented next), the punishment that the master applies to a worker caught 
cheating is proportional to the number of cheaters: Let $F$ denote the set of workers caught cheating in a round that 
the master verifies. Then, in that round, the master applies penalty $WP_C \cdot |F|$ to every worker in $F$. The fact that 
punishment is proportional to the number of cheaters is, intuitively, an important tool to implement peer-punishment, 
which is required in the repeated games framework. Hence, we carry out our analysis for proportional punishments. 
The study of constant punishments is an open question that we leave for future work. 
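
The reward and punishment scheme described above can be sketched in code. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `Payoffs` and `worker_payoff` are hypothetical, `n` is assumed odd so that no tie needs to be broken, and the three fields mirror $WP_C$, $WC_T$ and $WB_A$ from Table I.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Payoffs:
    """Worker-side payoff parameters (hypothetical container)."""
    wp_c: float  # WP_C: penalty for being caught cheating
    wc_t: float  # WC_T: cost of computing the task
    wb_a: float  # WB_A: benefit when the master accepts the worker's answer


def worker_payoff(cheated: bool, num_cheaters: int, n: int,
                  verified: bool, p: Payoffs) -> float:
    """Realized payoff of one worker in one round (assumes n is odd)."""
    cost = 0.0 if cheated else p.wc_t  # only honest workers pay for computing
    if verified:
        if cheated:
            # Proportional punishment: WP_C times the number of cheaters caught.
            return -p.wp_c * num_cheaters
        return p.wb_a - cost
    # No verification: the majority is rewarded, the minority is not penalized.
    cheaters_are_majority = num_cheaters > n / 2
    in_majority = (cheated == cheaters_are_majority)
    return (p.wb_a if in_majority else 0.0) - cost
```

For example, with `Payoffs(wp_c=1.0, wc_t=0.1, wb_a=1.0)` and three cheaters out of five workers, a cheater loses 3.0 in a verified round, while in an unverified round it collects the reward because the cheaters form the majority.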

The repeated game framework. We assume that workers participate in the system within the framework of a repeated 
game [21]. The objective of the repeated games model is to capture the effect of long-term interaction. That is, 
workers take into account that their behavior in one computation will have an effect on the behavior of other workers 
in the future. We further assume that workers behave according to the reality that they perceive. That is, although 
their participation is physically bounded to a finite number of rounds, it is known [21] that workers participate in 
the game as if the number of repetitions is infinite, even for a small number of repetitions, until the last repetition 
is close. (Note that prediction mechanisms such as the one in [25] can be used to establish the availability of a 
set of workers for a sufficiently long period of time.) In this paper, we assume that workers participate unaware of 
when their participation will end, and we analyze the system as an infinitely repeated game. In the following game 
specification, we follow the notation in [21]. 

Let the set of workers be $W$, and for each worker $i \in W$ let the set of strategies cheat and not-cheat be $A_i = \{C, \bar{C}\}$, 
and the preference relation (which is the obvious preference for a higher worker utility) be $u_i : A \to \mathbb{R}$, $A = \times_{j \in W} A_j$. 
Then, we model each round of computation as the mixed extension $G' = \langle W, (\Delta(A_i))_{i \in W}, (U_i)_{i \in W} \rangle$ of 
the strategic game $G = \langle W, (A_i)_{i \in W}, (u_i)_{i \in W} \rangle$, where $\Delta(A_i)$ is the set of probability distributions over $A_i$, and 
$U_i : \times_{j \in W} \Delta(A_j) \to \mathbb{R}$ is the expected utility of worker $i$ under $u_i$ induced by $\times_{j \in W} \Delta(A_j)$. 
On the other hand, we model the multiple-round long-running computation as an infinitely repeated game where 
the constituent game is $G'$. That is, an extensive game with simultaneous moves $\langle W, H, P, (U_i^*)_{i \in W} \rangle$, where 
$H = \{\emptyset\} \cup \left( \bigcup_{r=1}^{\infty} A^r \right) \cup A^\infty$, $\emptyset$ is the initial empty history, $A^\infty$ is the set of infinite sequences of action 
profiles in $G$, the (next) player function is $P(h) = W$ (simultaneous moves) for each history $h \in H$, and $U_i^*$ is 
a preference relation on $A^\infty$ that extends the preference relation $U_i$ to infinite rounds under the limit of means 
criterion. That is, the payoff of all rounds is evaluated symmetrically, in contrast with other criteria where the value 
of a given gain may diminish with time. 
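
For reference, the limit of means criterion is usually stated as follows in the repeated-games literature. The stream symbols $x_i^r$ and $y_i^r$ (the payoffs of worker $i$ in round $r$ under two alternative infinite plays) are notation introduced here for illustration only.

```latex
% Limit of means: worker i prefers the payoff stream (x_i^r) to (y_i^r)
% exactly when the long-run average advantage is strictly positive.
(x_i^r)_{r \ge 1} \succ_i^{*} (y_i^r)_{r \ge 1}
\iff
\liminf_{T \to \infty} \frac{1}{T} \sum_{r=1}^{T} \left( x_i^r - y_i^r \right) > 0 .
```

Under this criterion a one-time gain $g$ from deviating, followed by a per-round loss $\ell > 0$ during an unbounded punishment, yields an average advantage of $(g - (T-1)\ell)/T$, which tends to $-\ell < 0$; any finite gain is therefore outweighed.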

We are interested in equilibria that harness long-term objectives of the workers, rather than strategies that apply 
to short-sighted workers that would isolate each round of computation as a single game. To support equilibria of the 
infinitely repeated game that are not simple repetitions of equilibria in the constituent game, workers are deterred 
from deviating by being punished by other workers. Specifically, any deviation from an agreed equilibrium, called 
a trigger strategy, of some worker i is punished by all other workers changing their strategy to enforce the minmax 
payoff $v_i$, that is, the lowest payoff that the other workers can force upon worker $i$. 

Any payoff profile $w$, where for each worker $i$ it is $w_i > v_i$, is called enforceable. The Nash folk theorem for the 
limit of means criterion [21] establishes that every feasible² enforceable payoff profile of the constituent game is 
a Nash equilibrium payoff profile of the limit of means infinitely repeated game. Thus, any trigger strategy that 
yields an enforceable payoff profile is an equilibrium. So, we focus our effort on finding the minmax payoff to later 
analyze which of the infinitely many trigger criteria yields a mechanism that is beneficial for the master and the 
workers. For clarity, we present our mechanisms punishing the deviant indefinitely. Nevertheless, it is enough to 
hold the deviant’s payoff to the minmax level for enough rounds to wipe out its gain from the deviation, as shown 
by the Perfect folk theorem [21]. 

In order to make punishment decisions, workers need information about previous outcomes. We consider two 
scenarios, one where the master only provides the number of different answers received, and another where the 
master reports which answers were received and how many of each. In the first case, workers are bound 
to use pure equilibria of the constituent game, so that non-deviators can decide to punish if there were workers 
replying with an answer other than theirs. For the second case, a worker that did not cheat may count the number 
of answers different from its own. But a worker that cheated needs to know which one was correct. We assume 
that workers can verify the answers with negligible cost. Notice that the same does not apply to the master, which 
is assumed to have a high operation cost yielding a high cost of verification. 

III. Limited Information (pure equilibria) 

In this section we assume that workers decide to be honest or to cheat deterministically, i.e., follow pure strategies. 
Under this assumption we define the mechanism shown in Algorithm 2. We will show using repeated-games analysis 
that this mechanism leads to an equilibrium where no worker cheats. The algorithm requires in every round 
the number of different answers received by the master in the round. This is provided by the master, as shown in 
Algorithm 1. 

²Any convex combination of payoff profiles of outcomes in $A$. 
Algorithm 1: Pure strategies master algorithm. The probability of verification $p_V$ is any value such that 
$WC_T/(WB_A + WP_C \lceil n/2 \rceil) < p_V < (WB_A + WC_T)/(2 WB_A + n WP_C)$. 

    while true do
        send a computational task to all the workers in $W$
        upon receiving all answers do
            with probability $p_V$, verify the answers
            if the answers were not verified then accept the majority
            reward/penalize accordingly
            send to the workers the number of different answers received


Algorithm 2: Pure strategies worker algorithm. 

    strategy ← $\bar{C}$
    while true do
        upon receiving a task do
            if strategy = $\bar{C}$ then
                compute the task and send the result to the master
            else
                do not compute and send a bogus result to the master
        upon receiving from the master the number of different answers do
            if number of different answers > 1 then
                strategy ← $C$
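
The interplay of Algorithms 1 and 2 can be illustrated with a short simulation. The sketch below is only an illustration under stated assumptions: the function name `run_pure_mechanism` and the `deviator_round` parameter are hypothetical, all cheaters are assumed to return the same bogus value, and a verified round is assumed to yield the correct result whenever at least one worker was honest.

```python
import random


def run_pure_mechanism(n, rounds, p_v, deviator_round=None, seed=0):
    """Simulate the pure-strategies mechanism; return per-round correctness."""
    rng = random.Random(seed)
    strategies = ["honest"] * n          # every worker starts by not cheating
    accepted_correct = []
    for r in range(rounds):
        cheats = [s == "cheat" for s in strategies]
        if deviator_round is not None and r == deviator_round:
            cheats[0] = True             # worker 0 deviates in this round
        num_cheaters = sum(cheats)
        verified = rng.random() < p_v    # master verifies with probability p_V
        if verified:
            correct = num_cheaters < n   # verification fails if all cheat
        else:
            correct = num_cheaters < n / 2  # master accepts the majority
        accepted_correct.append(correct)
        # The master announces the number of different answers received.
        distinct = (1 if num_cheaters < n else 0) + (1 if num_cheaters > 0 else 0)
        if distinct > 1:
            strategies = ["cheat"] * n   # peer punishment: everyone cheats
    return accepted_correct
```

Without a deviation every round is correct. With `deviator_round=2` the master still accepts the correct majority answer in that round, but all workers switch to cheating afterwards, which is the punishment threat that the analysis relies on.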


We analyze now the properties of this mechanism. Although the algorithms allow that a cheating worker replies 
with any value, in the analysis of the rest of the section we assume that all workers that cheat in a round return 
the same incorrect value (as done also, for example, in 0, ini). Intuitively, this assumption yields a worst case 
scenario (and hence analysis) for the master with respect to obtaining the correct result. 

Recall that, for any given round of the constituent game, $U_i$ is the expected utility of worker $i$ and $s_i$ is the 
strategy chosen by worker $i$. Let $F$ be the set of cheaters and $F_{-i} = \{ j \mid j \in W \wedge j \neq i \wedge s_j = C \}$. Then, the 
following holds. 

Lemma 1: If $WB_A > WC_T$ and $WP_C > WC_T$, for any $p_V$ such that $WC_T/(WB_A + WP_C \lceil n/2 \rceil) < p_V < (WB_A + WC_T)/(2 WB_A + n WP_C)$, 
and for any worker $i \in W$, the minmax expected payoff is $v_i = (1 - p_V) WB_A - p_V n WP_C$, 
which is obtained when $s_i = C$ and $|F_{-i}| > (n-1)/2$. 

Proof: We notice first that, given that $WB_A > WC_T$ and $WP_C > WC_T$, the range of values 
$WC_T/(WB_A + WP_C \lceil n/2 \rceil) < p_V < (WB_A + WC_T)/(2 WB_A + n WP_C)$ for $p_V$ is not empty. For any worker $i$, there are four 
possible utility outcomes, namely, 

$$U_i(s_i = C, |F| > n/2) = (1 - p_V) WB_A - p_V WP_C |F| \tag{1}$$
$$U_i(s_i = \bar{C}, |F| > n/2) = p_V WB_A - WC_T \tag{2}$$
$$U_i(s_i = C, |F| < n/2) = -p_V WP_C |F| \tag{3}$$
$$U_i(s_i = \bar{C}, |F| < n/2) = WB_A - WC_T \tag{4}$$

We want to find worker $i$’s minmax payoff, which is the lowest payoff that the other workers can force upon $i$ 
(cf. [21]). That is, 

$$v_i = \min_{s_{-i} \in \{C, \bar{C}\}^{n-1}} \; \max_{s_i \in \{C, \bar{C}\}} U_i(s_i, s_{-i}).$$

From the perspective of worker $i$, the actions of the remaining workers fall in one of three cases: $|F_{-i}| < (n-1)/2$, 
$|F_{-i}| = (n-1)/2$, and $|F_{-i}| > (n-1)/2$. Thus, we have 

$$v_i = \min\Big\{ \max_{s_i \in \{C, \bar{C}\}} U_i(s_i, |F_{-i}| < (n-1)/2),\; \max_{s_i \in \{C, \bar{C}\}} U_i(s_i, |F_{-i}| = (n-1)/2),\; \max_{s_i \in \{C, \bar{C}\}} U_i(s_i, |F_{-i}| > (n-1)/2) \Big\} \tag{5}$$

From Equations (3) and (4), 

$$\max_{s_i \in \{C, \bar{C}\}} U_i(s_i, |F_{-i}| < (n-1)/2) = \max\{ -p_V WP_C |F|,\; WB_A - WC_T \}, \quad \text{for } 1 \le |F| < \lceil n/2 \rceil.$$

Given that $WB_A > WC_T$, it is 

$$\max_{s_i \in \{C, \bar{C}\}} U_i(s_i, |F_{-i}| < (n-1)/2) = WB_A - WC_T. \tag{6}$$

From Equations (1) and (2), 

$$\max_{s_i \in \{C, \bar{C}\}} U_i(s_i, |F_{-i}| > (n-1)/2) = \max\{ (1 - p_V) WB_A - p_V WP_C |F|,\; p_V WB_A - WC_T \}, \quad \text{for } \lceil n/2 \rceil < |F| \le n.$$

Given that $p_V < (WB_A + WC_T)/(2 WB_A + n WP_C)$, it is 

$$\max_{s_i \in \{C, \bar{C}\}} U_i(s_i, |F_{-i}| > (n-1)/2) = (1 - p_V) WB_A - p_V WP_C |F|, \quad \text{for } \lceil n/2 \rceil < |F| \le n. \tag{7}$$

From Equations (1) and (4), 

$$\max_{s_i \in \{C, \bar{C}\}} U_i(s_i, |F_{-i}| = (n-1)/2) = \max\{ (1 - p_V) WB_A - p_V WP_C \lceil n/2 \rceil,\; WB_A - WC_T \}.$$

Given that $p_V > WC_T/(WB_A + WP_C \lceil n/2 \rceil)$, it is 

$$\max_{s_i \in \{C, \bar{C}\}} U_i(s_i, |F_{-i}| = (n-1)/2) = WB_A - WC_T. \tag{8}$$

Replacing Equations (6), (7) and (8) in Equation (5), we obtain 

$$v_i = \min\{ WB_A - WC_T,\; (1 - p_V) WB_A - p_V WP_C |F| \}, \quad \text{for } \lceil n/2 \rceil < |F| \le n$$
$$\phantom{v_i} = (1 - p_V) WB_A - p_V n WP_C.$$

The latter is true because $p_V > WC_T/(WB_A + WP_C \lceil n/2 \rceil) > WC_T/(WB_A + n WP_C)$. Thus, the claim holds. 

■ 
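
The closed form of Lemma 1 can be checked numerically by enumerating how many of the other workers cheat. The sketch below is a sanity check under stated assumptions rather than part of the proof: the function names are hypothetical, `n` is assumed odd, and the four cases follow the utility outcomes listed in the proof.

```python
import math


def pv_range(n, wb_a, wc_t, wp_c):
    """Open interval of verification probabilities allowed by Lemma 1."""
    low = wc_t / (wb_a + wp_c * math.ceil(n / 2))
    high = (wb_a + wc_t) / (2 * wb_a + n * wp_c)
    return low, high


def expected_utility(cheats, total_cheaters, n, p_v, wb_a, wc_t, wp_c):
    """Expected one-round utility of a worker (the four cases of the proof)."""
    majority_cheats = total_cheaters > n / 2
    if cheats:
        if majority_cheats:
            return (1 - p_v) * wb_a - p_v * wp_c * total_cheaters
        return -p_v * wp_c * total_cheaters
    if majority_cheats:
        return p_v * wb_a - wc_t
    return wb_a - wc_t


def minmax_payoff(n, p_v, wb_a, wc_t, wp_c):
    """Brute-force minmax: others pick k cheaters, worker i best-responds."""
    return min(
        max(expected_utility(True, k + 1, n, p_v, wb_a, wc_t, wp_c),
            expected_utility(False, k, n, p_v, wb_a, wc_t, wp_c))
        for k in range(n)  # k = number of cheaters among the other n - 1 workers
    )
```

For instance, with `n = 5`, `wb_a = 1.0`, `wc_t = 0.1`, `wp_c = 1.0` and `p_v = 0.1` (which lies inside `pv_range`), the enumeration returns 0.4, matching $(1 - p_V) WB_A - p_V n WP_C = 0.9 - 0.5$.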

The following theorem establishes the correctness of our pure-strategies mechanism. The proof follows from 
Lemma 1 and the repeated-games framework [21]. 

Theorem 2: For a long-running multi-round computation system with set of workers $W$, and for a set of payoff 
parameters such that $WB_A > WC_T$ and $WP_C > WC_T$, the mechanism defined in Algorithms 1 and 2 guarantees 
that the master always obtains the correct answer. In each round, the utility of each worker $i$ is $U_i = WB_A - WC_T$ 
and the expected utility of the master is $U_M = MB_R - n MC_A - p_V MC_V$. 
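
As a small worked example of the per-round utilities in Theorem 2, the following sketch evaluates both expressions. The function name and the sample parameter values are assumptions chosen only for illustration.

```python
def equilibrium_utilities(n, p_v, wb_a, wc_t, mb_r, mc_a, mc_v):
    """Per-round utilities when no worker cheats (expressions of Theorem 2)."""
    worker = wb_a - wc_t                    # U_i = WB_A - WC_T
    master = mb_r - n * mc_a - p_v * mc_v   # U_M = MB_R - n*MC_A - p_V*MC_V
    return worker, master


# Illustrative values only: 5 workers, verification probability 0.1.
worker_u, master_u = equilibrium_utilities(
    n=5, p_v=0.1, wb_a=1.0, wc_t=0.1, mb_r=10.0, mc_a=1.0, mc_v=2.0)
```

With these sample values each worker earns 0.9 per round and the master's expected utility is 10 - 5 - 0.2 = 4.8; lowering $p_V$ within the range allowed by Lemma 1 raises the master's utility.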

IV. More Information Helps (mixed equilibria) 

In this section, we consider the general case in which workers can randomize their decision to be honest or cheat. 
In this case, the equilibrium in the constituent game is mixed, that is, in the equilibrium workers cheat with some 
probability $p \in (0,1)$, rather than behaving deterministically. Hence, the actual probability used by each worker 
cannot be inferred accurately from the outcome of one computation. In other words, even knowing that some given 
worker has cheated after a computation, it is not possible to know if this event was a deviation from the equilibrium 
or not from one single worker outcome. Nonetheless, it is possible to provide stochastic guarantees, either from 
many computations of one worker, or one computation by multiple workers. Such guarantees may be enough for 
some scenarios. 

Specifically, if the master announces how many of each answer it has received, workers may make punishment 
decisions based on the probability of such outcome. This is possible even for workers that did not compute the 
task, given that the cost of verification for workers is assumed to be negligible. Such decision will not be based 
on deterministic information, but we can provide guarantees on the probability of being the right decision. Hence, 
in this section we define a mechanism where the master sends to all the workers in each round the answers that it 
has received and how many of each. This is described in Algorithm 3. In summary, the approach is to carry out 
the computation as in a regular repeated game, but punishments are applied when it is known that some workers 
have deviated from the equilibrium of the constituent game with some parametric probability. The mechanism that 
is assigned to the workers is described in Algorithm 4. 

We emphasize that what is punished by peer-workers is deviation from the agreed equilibrium (rather than 
cheating which is punished by the master). That is, if the equilibrium is to cheat with some probability p and the 
number of incorrect answers was x, punishment is applied when the probability of x incorrect answers is very low 
if all workers were using p, either if x is less or more than the number of cheaters expected. 

It is known [21] that it is enough to apply peer-punishment for a number of rounds that neutralizes the benefit 
that the deviators might have obtained by deviating. Nonetheless, to avoid unnecessary clutter, we omit this detail 
in Algorithm 4, where punishment is applied forever. In our simulations in Section V, we limit the punishment to 
one round, since that is enough to neutralize the benefit of a deviation for those parameters. 

As in the analysis of the previous section, to obtain a worst-case analysis for the master, we assume that cheating 
workers return the same incorrect result. 

The following lemma characterizes the number of incorrect answers and the number of rounds that should trigger 
a punishment. 
Algorithm 3: Mixed strategies master algorithm. $p_V$ is set according to Lemma 5. 

    while true do
        send a computational task to all the workers in $W$
        upon receiving all answers do
            with probability $p_V$, verify the answers
            if the answers were not verified then accept the majority
            reward/penalize accordingly
            send to the workers a list of pairs ⟨answer, count⟩


Lemma 3: In a system with $n$ workers where the mixed equilibrium of the constituent game is to cheat with 
some probability $p_C > 0$, if the number of incorrect answers is at least $(1 + \delta)\lceil n p_C \rceil$ or at most $(1 - \delta)\lfloor n p_C \rfloor$ 
during $r$ consecutive rounds, for any $1/(n p_C) \le \delta < 1$, $r \ge 1$, and $\varepsilon > 0$ such that $r\delta^2 \ge 3\ln(1/\varepsilon)/(n p_C)$, then 
there are one or more workers deviating from that equilibrium with probability at least $1 - \varepsilon$. 

Proof: Let $1, 2, \ldots, n$ be some labeling identifying the workers. For any given round of computation, let $X_i$ 
be a random variable indicating whether worker $i$ cheated or not, and let $X = \sum_{i=1}^{n} X_i$. For $i = 1, 2, \ldots, n$, 
the $X_i$ random variables are not correlated. Thus, we can upper bound the tails of the probability distribution 
on the number of cheaters using the following Chernoff-Hoeffding bounds [26]. For $0 < \delta \le 1$, it is 
$\Pr(X \ge (1 + \delta) n p_C) \le e^{-n p_C \delta^2/3}$, and for $0 < \delta < 1$, it is $\Pr(X \le (1 - \delta) n p_C) \le e^{-n p_C \delta^2/2}$. Therefore, it is 
$\Pr(X \ge (1 + \delta)\lceil n p_C \rceil) \le e^{-n p_C \delta^2/3}$ and $\Pr(X \le (1 - \delta)\lfloor n p_C \rfloor) \le e^{-n p_C \delta^2/2}$. Given that $X$ cannot differ from the 
expected number of cheaters by less than one worker, we further restrict $\delta$ as follows. For $1/(n p_C) \le \delta < 1$, it is 
$\Pr(X \ge (1 + \delta)\lceil n p_C \rceil) \le e^{-n p_C \delta^2/3}$ and $\Pr(X \le (1 - \delta)\lfloor n p_C \rfloor) \le e^{-n p_C \delta^2/2}$. 

Letting $E_{high}$ be the event of having $X \ge (1 + \delta)\lceil n p_C \rceil$ incorrect answers for $r$ consecutive rounds, and $E_{low}$ be 
the event of having $X \le (1 - \delta)\lfloor n p_C \rfloor$ incorrect answers for $r$ consecutive rounds, it is $\Pr(E_{high}) \le e^{-r n p_C \delta^2/3}$ 
and it is $\Pr(E_{low}) \le e^{-r n p_C \delta^2/2}$. Given that $e^{-r n p_C \delta^2/2} \le e^{-r n p_C \delta^2/3} \le \varepsilon$ for $r\delta^2 \ge 3\ln(1/\varepsilon)/(n p_C)$, if either 
$E_{high}$ or $E_{low}$ occurs, there are one or more workers deviating from equilibrium with probability at least $1 - \varepsilon$ 
and the claim follows. ■ 
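
To make the thresholds of Lemma 3 concrete, the sketch below computes the smallest admissible $\delta$ for a given window of $r$ rounds and the corresponding bounds on the number of incorrect answers. The function name `deviation_thresholds` and its return convention are assumptions made for this illustration.

```python
import math


def deviation_thresholds(n, p_c, r, eps):
    """Return (delta, high, low) for a window of r rounds, or None.

    delta is the smallest value satisfying r * delta^2 >= 3 ln(1/eps) / (n p_c);
    None is returned when it falls outside 1/(n p_c) <= delta < 1.
    """
    expected = n * p_c
    delta = math.sqrt(3 * math.log(1 / eps) / (r * expected))
    if not (1 / expected <= delta < 1):
        return None
    high = (1 + delta) * math.ceil(expected)   # "at least" bound of the lemma
    low = (1 - delta) * math.floor(expected)   # "at most" bound of the lemma
    return delta, high, low
```

For example, with `n = 100`, `p_c = 0.5`, `eps = 0.05` and a single round, $\delta$ is roughly 0.42, so a deviation is flagged only if the round shows at least about 71.2 or at most about 28.8 incorrect answers. Longer windows shrink $\delta$ and tighten both bounds.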

In the following lemma, we show what is the minmax payoff when a mixed equilibrium of the constituent game 
is allowed. 

Lemma 4: For any p\>, such that p\> > 2WBa /+ WPcn) and py > WCp / WBa, and for any worker 
$i \in W$, the minmax expected payoff of worker $i$ is $v_i = p_V WB_A - WC_T$, which is attained when all the other 
workers use $p_C = 1$. 

Proof: Let $\sigma$ be a mixed strategy profile, that is, a mapping from workers to probability distributions over the 
pure strategies $\{C, \bar{C}\}$, let $\sigma_i$ be the probability distribution $\{p_{C_i}, (1 - p_{C_i})\}$ of worker $i$ in $\sigma$, and let $\sigma_{-i}$ be the 
mixed strategy profile of all workers but $i$. We want to find worker $i$’s minmax payoff, which is the lowest payoff 
that the other workers can force upon $i$ (cf. [21]). That is, 

$$v_i = \min_{\sigma_{-i}} \max_{\sigma_i} U_i(\sigma_i, \sigma_{-i}).$$









Algorithm 4: Mixed strategies worker algorithm. p_C is initialized according to Lemma 5; ε > 0 is the probability of erroneous punishment (cf. Lemma 3).

    maxrounds ← ⌊3 n p_C ln(1/ε)⌋    // Punishment decisions only for δ ≥ 1/(n p_C) (cf. Lemma 3).
    counts ← empty queue of integers    // counts[i] is the (i + 1)th item, for i = 0, 1, 2, ...
    for each round = 1, 2, ... do
        upon receiving a task do
            // computation phase
            cheat ← true with probability p_C, false with probability 1 − p_C
            if cheat = false then result ← task result computed
            else result ← bogus result
            send result to the master
        // punishment phase
        upon receiving from the master a list of pairs ⟨answer, count⟩ do
            // update # of cheaters per round
            verify all answers
            #incorrect ← number of incorrect answers
            enqueue #incorrect to counts
            if size of counts > maxrounds then dequeue from counts
            // punishment decision
            cheaters_min ← n
            cheaters_max ← 0
            R ← min{maxrounds, round}
            for r = 1 to R do
                if counts[R − r] < cheaters_min then cheaters_min ← counts[R − r]
                if counts[R − r] > cheaters_max then cheaters_max ← counts[R − r]
                δ ← sqrt(3 ln(1/ε) / (r n p_C))
                if δ < 1 then
                    if cheaters_min ≥ ⌈(1 + δ) n p_C⌉ or cheaters_max ≤ ⌊(1 − δ) n p_C⌋ then    // Lemma 3
                        p_C ← 1    // Lemma 4
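The punishment decision of Algorithm 4 can be illustrated with a minimal Python sketch. It follows our reading of the pseudocode (in particular, the comparisons are taken as non-strict), and the function names `max_rounds`, `record_round`, and `should_punish` are hypothetical helpers introduced only for this illustration.

```python
import math
from collections import deque


def max_rounds(n, p_c, eps):
    """History window length: floor(3 * n * p_C * ln(1/eps))."""
    return math.floor(3 * n * p_c * math.log(1 / eps))


def record_round(counts, incorrect, maxrounds):
    """Enqueue this round's number of incorrect answers; drop the oldest if full."""
    counts.append(incorrect)
    if len(counts) > maxrounds:
        counts.popleft()


def should_punish(counts, n, p_c, eps):
    """Return True if the last r rounds, for some r, are all above the high
    threshold or all below the low threshold (assumed reading of Algorithm 4)."""
    cheaters_min, cheaters_max = n, 0
    history = list(counts)  # oldest first, most recent last
    for r in range(1, len(history) + 1):
        c = history[-r]  # r-th most recent round
        cheaters_min = min(cheaters_min, c)
        cheaters_max = max(cheaters_max, c)
        delta = math.sqrt(3 * math.log(1 / eps) / (r * n * p_c))
        if delta < 1:
            high = math.ceil((1 + delta) * n * p_c)
            low = math.floor((1 - delta) * n * p_c)
            if cheaters_min >= high or cheaters_max <= low:
                return True
    return False
```

Under this reading, with n = 81, p_C = 0.1, and ε = 0.01, two consecutive rounds with at least 16 incorrect answers each are enough to trigger punishment, whereas a single round never is, because δ ≥ 1 for r = 1.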


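For any worker $i$, there are four possible utility outcomes, namely,

$$U_i(s_i = C, |F| > n/2) = (1 - p_V) WB_A - p_V WP_C |F|$$

$$U_i(s_i = \bar{C}, |F| > n/2) = p_V WB_A - WC_T$$

$$U_i(s_i = C, |F| < n/2) = -p_V WP_C |F|$$

$$U_i(s_i = \bar{C}, |F| < n/2) = WB_A - WC_T.$$

The expected utility of a worker $i$ that deviates from equilibrium when all other workers use the same mixed strategy $p_{C_{-i}}$ is the following.

$$\begin{aligned}
U_i = {} & p_{C_i}\Big(\Pr\big(|F_{-i}| > \tfrac{n}{2} - 1\big)\, U_i\big(s_i = C, |F| > \tfrac{n}{2}\big) + \Pr\big(|F_{-i}| < \tfrac{n}{2} - 1\big)\, U_i\big(s_i = C, |F| < \tfrac{n}{2}\big)\Big) \\
& + (1 - p_{C_i})\Big(\Pr\big(|F_{-i}| > \tfrac{n}{2}\big)\, U_i\big(s_i = \bar{C}, |F| > \tfrac{n}{2}\big) + \Pr\big(|F_{-i}| < \tfrac{n}{2}\big)\, U_i\big(s_i = \bar{C}, |F| < \tfrac{n}{2}\big)\Big).
\end{aligned}$$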

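It can be seen that $U_i$ is linear with respect to $p_{C_i}$. That is, the function is either monotonically increasing, monotonically decreasing, or constant with respect to $p_{C_i}$, depending on the relation among the parameters, but it does not have critical points. Hence, for any given $p_{C_{-i}}$ the maximum utility for worker $i$ occurs when $p_{C_i}$ is either 0 or 1. We get then that the maximum $U_i$ is either

$$U_i(p_{C_i} = 1) = \Pr\big(|F_{-i}| > \tfrac{n}{2} - 1\big)\, U_i\big(s_i = C, |F| > \tfrac{n}{2}\big) + \Pr\big(|F_{-i}| < \tfrac{n}{2} - 1\big)\, U_i\big(s_i = C, |F| < \tfrac{n}{2}\big),$$

or

$$U_i(p_{C_i} = 0) = \Pr\big(|F_{-i}| > \tfrac{n}{2}\big)\, U_i\big(s_i = \bar{C}, |F| > \tfrac{n}{2}\big) + \Pr\big(|F_{-i}| < \tfrac{n}{2}\big)\, U_i\big(s_i = \bar{C}, |F| < \tfrac{n}{2}\big).$$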


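Replacing and using that $\sum_{j=0}^{n-1} \binom{n-1}{j} p_{C_{-i}}^{j} (1 - p_{C_{-i}})^{n-1-j} = 1$, and that $\sum_{j=0}^{n-1} \binom{n-1}{j}\, j\, p_{C_{-i}}^{j} (1 - p_{C_{-i}})^{n-1-j} = (n-1) p_{C_{-i}}$, we have the following.

$$\begin{aligned}
U_i(p_{C_i} = 1) = {} & \sum_{j=(n-1)/2}^{n-1} \binom{n-1}{j} p_{C_{-i}}^{j} (1 - p_{C_{-i}})^{n-1-j} \big((1 - p_V) WB_A - p_V WP_C (j+1)\big) \\
& + \sum_{j=0}^{(n-3)/2} \binom{n-1}{j} p_{C_{-i}}^{j} (1 - p_{C_{-i}})^{n-1-j} \big(-p_V WP_C (j+1)\big) \\
= {} & -p_V WP_C \big(1 + (n-1) p_{C_{-i}}\big) + (1 - p_V) WB_A \sum_{j=(n-1)/2}^{n-1} \binom{n-1}{j} p_{C_{-i}}^{j} (1 - p_{C_{-i}})^{n-1-j}.
\end{aligned} \tag{9}$$

And

$$\begin{aligned}
U_i(p_{C_i} = 0) = {} & \sum_{j=(n+1)/2}^{n-1} \binom{n-1}{j} p_{C_{-i}}^{j} (1 - p_{C_{-i}})^{n-1-j} (p_V WB_A - WC_T) \\
& + \sum_{j=0}^{(n-1)/2} \binom{n-1}{j} p_{C_{-i}}^{j} (1 - p_{C_{-i}})^{n-1-j} (WB_A - WC_T) \\
= {} & p_V WB_A - WC_T + (1 - p_V) WB_A \sum_{j=0}^{(n-1)/2} \binom{n-1}{j} p_{C_{-i}}^{j} (1 - p_{C_{-i}})^{n-1-j}.
\end{aligned} \tag{10}$$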

To find the minmax payoff, we want to find the mixed strategy (i.e., $p_{C_{-i}}$) that other workers may choose to minimize these utilities. Equation (10) is minimized when $p_{C_{-i}} = 1$, yielding $U_i(p_{C_i} = 0, p_{C_{-i}} = 1) = p_V WB_A - WC_T > 0$. The latter inequality is true because $p_V > WC_T/WB_A$. On the other hand, given that $p_V \geq 2WB_A/(2WB_A + WP_C n)$, it is $p_V WP_C n \geq 2(1 - p_V) WB_A$. Then, from Equation (9) we have

$$\begin{aligned}
U_i(p_{C_i} = 1) \leq {} & -p_V WP_C (1 - p_{C_{-i}}) - 2(1 - p_V) WB_A\, p_{C_{-i}} + (1 - p_V) WB_A \sum_{j=(n-1)/2}^{n-1} \binom{n-1}{j} p_{C_{-i}}^{j} (1 - p_{C_{-i}})^{n-1-j} \\
= {} & -p_V WP_C (1 - p_{C_{-i}}) + (1 - p_V) WB_A \Big(-2 p_{C_{-i}} + \sum_{j=(n-1)/2}^{n-1} \binom{n-1}{j} p_{C_{-i}}^{j} (1 - p_{C_{-i}})^{n-1-j}\Big).
\end{aligned} \tag{11}$$

Given that $-2 p_{C_{-i}} + \sum_{j=(n-1)/2}^{n-1} \binom{n-1}{j} p_{C_{-i}}^{j} (1 - p_{C_{-i}})^{n-1-j} \leq 0$ (by Markov's inequality, the sum is at most $(n-1) p_{C_{-i}} / ((n-1)/2) = 2 p_{C_{-i}}$), Equation (11) is negative for any $p_{C_{-i}}$. Therefore, $U_i(p_{C_i} = 0, p_{C_{-i}} = 1) > U_i(p_{C_i} = 1, p_{C_{-i}} = p)$ for any $p \in [0, 1]$, and the minmax expected payoff is $v_i = U_i(p_{C_i} = 0, p_{C_{-i}} = 1) = p_V WB_A - WC_T$. ■
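As a numerical sanity check of Lemma 4, the following Python sketch evaluates Equations (9) and (10) on a grid of values of $p_{C_{-i}}$ and takes the min-max over that grid. The function names and the grid resolution are assumptions made for illustration; the parameter values used below (n = 9, $WB_A = 1$, $WC_T = 0.1$, $WP_C = WB_A/(n \cdot 0.1)$, $p_V = 0.17$) are those of the simulation section.

```python
from math import comb


def binom_pmf(m, j, p):
    """Probability that exactly j of m other workers cheat."""
    return comb(m, j) * p**j * (1 - p) ** (m - j)


def u_cheat(n, p_others, p_v, wb_a, wp_c):
    """Equation (9): worker i cheats with probability 1 (n assumed odd)."""
    tail = sum(binom_pmf(n - 1, j, p_others) for j in range((n - 1) // 2, n))
    return -p_v * wp_c * (1 + (n - 1) * p_others) + (1 - p_v) * wb_a * tail


def u_comply(n, p_others, p_v, wb_a, wc_t):
    """Equation (10): worker i complies with probability 1 (n assumed odd)."""
    head = sum(binom_pmf(n - 1, j, p_others) for j in range(0, (n - 1) // 2 + 1))
    return p_v * wb_a - wc_t + (1 - p_v) * wb_a * head


def minmax_payoff(n, p_v, wb_a, wp_c, wc_t, steps=100):
    """Min over the others' common p_C (grid) of worker i's best response."""
    grid = [k / steps for k in range(steps + 1)]
    return min(
        max(u_cheat(n, p, p_v, wb_a, wp_c), u_comply(n, p, p_v, wb_a, wc_t))
        for p in grid
    )
```

For these parameters the grid search returns $p_V WB_A - WC_T = 0.07$, attained at $p_{C_{-i}} = 1$, in agreement with the lemma. The search only covers symmetric strategies of the other workers, as in the proof.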

Given that the aim of the mechanism is to provide correctness, $p_V$ must be lower bounded and $p_C$ upper bounded to enforce an equilibrium that provides correctness guarantees in each computation with parametric probability, which we do in the following lemma.

Lemma 5: In any given round of computation, and for any $\varphi > 0$ and $0 < \xi < 1$, if each worker cheats with probability $p_C < 1/(2(1+\xi))$ and the master verifies with probability $p_V \geq \big(e^{-\xi^2 n/(6(1+\xi))} - \varphi\big)/\big(e^{-\xi^2 n/(6(1+\xi))} - p_C^n\big)$, the probability that the master obtains a wrong answer is at most $\varphi$.

Proof: Let $1, 2, \ldots, n$ be some labeling identifying the workers. Let $X_i$ be a random variable indicating whether worker $i$ cheated or not, and let $X = \sum_{i=1}^{n} X_i$. For $i = 1, 2, \ldots, n$, the $X_i$ random variables are not correlated.

Thus, we can upper bound the probability of having a majority of cheaters using the following Chernoff-Hoeffding bound [26]. For $0 < \xi < 1$, it is $\Pr(X \geq (1+\xi) n p_C) \leq e^{-\xi^2 n p_C/3}$. For $p_C < 1/(2(1+\xi))$ we get

$$\Pr(X \geq (1+\xi) n p_C) \leq e^{-\xi^2 n/(6(1+\xi))}$$

$$\Pr(X > n/2) \leq e^{-\xi^2 n/(6(1+\xi))}.$$

Thus, for the master to achieve correctness with probability at least $1 - \varphi$, it is enough to have $(1 - p_V)\, e^{-\xi^2 n/(6(1+\xi))} + p_V\, p_C^n \leq \varphi$, which is true for $p_V \geq \big(e^{-\xi^2 n/(6(1+\xi))} - \varphi\big)/\big(e^{-\xi^2 n/(6(1+\xi))} - p_C^n\big)$ if $p_C^n < e^{-\xi^2 n/(6(1+\xi))}$, which holds for $p_C < 1/(2(1+\xi))$. ■
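The bound of Lemma 5 is easy to evaluate numerically. The sketch below is an illustration only: the function names are hypothetical, and the clamp to 0 when the Chernoff term is already below $\varphi$ is our addition (in that case any $p_V \geq 0$ satisfies the condition).

```python
import math


def chernoff_majority_bound(n, xi):
    """Upper bound e^{-xi^2 n / (6 (1 + xi))} on Pr(X > n/2) used in Lemma 5."""
    return math.exp(-(xi**2) * n / (6 * (1 + xi)))


def min_verification_probability(n, p_c, xi, phi):
    """Smallest p_V satisfying the condition of Lemma 5 (clamped at 0)."""
    if not 0 < xi < 1:
        raise ValueError("xi must lie in (0, 1)")
    if not 0 <= p_c < 1 / (2 * (1 + xi)):
        raise ValueError("p_c must be below 1 / (2 (1 + xi))")
    b = chernoff_majority_bound(n, xi)
    return max(0.0, (b - phi) / (b - p_c**n))


def wrong_answer_bound(n, p_c, xi, p_v):
    """(1 - p_V) * Chernoff term + p_V * p_C^n, the quantity bounded by phi."""
    return (1 - p_v) * chernoff_majority_bound(n, xi) + p_v * p_c**n
```

Plugging the returned $p_V$ back into `wrong_answer_bound` recovers $\varphi$ up to floating-point error, which mirrors the last step of the proof.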

The following theorem establishes the correctness guarantees of our mixed-strategies mechanism. 

Theorem 6: Consider a long-running multi-round computation system with a set of workers such that $n > 2$. For any set of payoff parameters, and any $0 < p_C < 1/(2(1+\xi))$ for some $0 < \xi < 1$, setting the probability of verification so that $p_V \geq 2WB_A/(2WB_A + WP_C n)$, $p_V > WC_T/WB_A$, and $p_V \geq \big(e^{-\xi^2 n/(6(1+\xi))} - \varphi\big)/\big(e^{-\xi^2 n/(6(1+\xi))} - p_C^n\big)$, for some $\varphi > 0$, the following applies to each round of computation. If workers comply with the repeated games framework when punishment is stochastically consistent, the mechanism defined in Algorithms 3 and 4 guarantees that the master obtains the correct answer with probability at least $1 - \varphi$, the expected utility of the master is



and the expected utility of each worker $i$ is

$$U_i = \big(p_C (1 - p_V) P_{\geq h} + (1 - p_C)\, p_V P_{>h} + (1 - p_C) P_{\leq h}\big) WB_A - p_C\, p_V \big(1 + (n-1) p_C\big) WP_C - (1 - p_C) WC_T,$$

where

$$P_{\geq h} = \sum_{j=(n-1)/2}^{n-1} \binom{n-1}{j} p_C^{j} (1 - p_C)^{n-1-j},$$

$$P_{>h} = \sum_{j=(n+1)/2}^{n-1} \binom{n-1}{j} p_C^{j} (1 - p_C)^{n-1-j}, \text{ and}$$

$$P_{\leq h} = \sum_{j=0}^{(n-1)/2} \binom{n-1}{j} p_C^{j} (1 - p_C)^{n-1-j}.$$
Proof: First, we notice that, by making $p_C$ arbitrarily close to 0, the expected worker utility is arbitrarily close to $WB_A - WC_T$, which is bigger than the minmax payoff $v_i = p_V WB_A - WC_T$ for any $p_V < 1$. Then, there exists an enforceable payoff profile $w$, that is, a payoff profile where $w_i > v_i$ for each worker $i$ and, hence, there exists a Nash equilibrium payoff profile that all workers will follow due to long-term rationality (cf. Proposition 144.3 in [21]). The rest of the claim follows from Lemmas 4 and 5. ■
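To make the enforceability argument concrete, the following Python sketch evaluates the worker utility of Theorem 6 and compares it with the minmax payoff of Lemma 4. The function names are hypothetical, n is assumed odd (as the summation limits require), and the parameter values in the usage note are those of the simulation section.

```python
from math import comb


def binom_sum(n, p_c, lo, hi):
    """Probability that between lo and hi (inclusive) of the n-1 others cheat."""
    m = n - 1
    return sum(comb(m, j) * p_c**j * (1 - p_c) ** (m - j) for j in range(lo, hi + 1))


def worker_equilibrium_utility(n, p_c, p_v, wb_a, wp_c, wc_t):
    """Expected worker utility of Theorem 6 when all workers use p_C."""
    p_ge_h = binom_sum(n, p_c, (n - 1) // 2, n - 1)
    p_gt_h = binom_sum(n, p_c, (n + 1) // 2, n - 1)
    p_le_h = binom_sum(n, p_c, 0, (n - 1) // 2)
    benefit = (p_c * (1 - p_v) * p_ge_h
               + (1 - p_c) * p_v * p_gt_h
               + (1 - p_c) * p_le_h) * wb_a
    punishment = p_c * p_v * (1 + (n - 1) * p_c) * wp_c
    cost = (1 - p_c) * wc_t
    return benefit - punishment - cost


def minmax_payoff(p_v, wb_a, wc_t):
    """Minmax payoff of Lemma 4."""
    return p_v * wb_a - wc_t
```

With n = 9, $p_C = 0.1$, $p_V = 0.17$, $WB_A = 1$, $WP_C = WB_A/(n p_C)$, and $WC_T = 0.1$, the equilibrium utility is roughly 0.78, well above the minmax payoff of 0.07, so the payoff profile is enforceable for these values.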

V. Simulations 

A. Design 

In this section, we present our simulations of the mechanism in Algorithms 3 and 4. For the sake of contrast, we also carry out simulations for the mechanism in [13], and for a repeated application of the one-shot mechanism in [18]. For easy reference, we denote our mechanism as RG (repeated games), the mechanism in [13] as ED (evolutionary dynamics), the mechanism in [18] as OS (one shot), and the repeated application of OS in a multi-round computation environment as ROS (repeated one shot).

These mechanisms differ as follows. RG includes the threat of a peer-punishment that stops workers from deviating from the agreed equilibrium, ED aims to converge to correctness after some time (rounds), and OS does not include provisions for equilibrium deviations. The common assumption among all three approaches is that workers comply with the mechanism laid out. In RG, workers are assumed to be rational in the long term, and consequently they will agree on a given Nash Equilibrium (i.e., a $p_C$) that is computed taking into account the potential profit of future interactions. In ED, it is assumed that workers update their $p_C$ for the next computation round using a particular formula, which is a function of the $p_C$ and the payment received in the previous computation round. In OS, workers are assumed to be rational but short-sighted, that is, they do not take into account future interactions (short-term rationality).

Once the parameters (payment, punishment, etc.) have been fixed (as we do for simulations), the performance of 
RG, ED, and ROS when all workers comply with the agreed protocol can be compared by a simple computation 
of expected utilities and probability of correctness. However, the tradeoffs between correctness and utilities when worker misbehavior occurs (in other words, the resilience of these systems to deviations) are an open question that we answer experimentally as follows.

For the sake of fair comparison, we use for all three mechanisms the same parameters used in [13], [18] whenever possible (summarized in Table II in the Appendix). Specifically, we set $WC_T = 0.1$. For each $n \in \{9, 27, 81\}$ and $WB_A \in \{1, 1.1, 1.2, \ldots, 2\}$, we set $WP_C = WB_A/(n p_C)$ for RG. That is, the expected punishment in RG is $WB_A$. Given that in ED and ROS the worker punishment is a constant value (i.e., it is not proportional to the number of cheaters), to simulate ED and OS we set $WP_C = WB_A$, that is, the expected punishment in RG. As indicated in [18], these parameters are consistent with statistics obtained in SETI@home projects.

Observing the conditions of Theorem 6, for RG we fix $p_C = 0.1 < 1/(2(1+\xi))$ for any $0 < \xi < 1$, and we fix $p_V = 0.17$, which for the parameters chosen verifies that $p_V \geq 2WB_A/(2WB_A + WP_C n) = 1/6$, that $p_V > WC_T/WB_A$ since $WC_T/WB_A \leq 1/10$, and that $p_V \geq \big(e^{-\xi^2 n/(6(1+\xi))} - \varphi\big)/\big(e^{-\xi^2 n/(6(1+\xi))} - p_C^n\big)$, which is true for some $\varphi > 0$ and $\xi > 0$ as required by Theorem 6. To implement the punishment decision in RG, we set the probability of punishment error to $\varepsilon = 0.01$.
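The first two conditions on $p_V$ can be checked mechanically over the whole parameter grid. The following Python sketch does so; the function name `theorem6_payoff_conditions` is a hypothetical helper, and the third condition is left out because it depends on the choice of $\varphi$ and $\xi$.

```python
def theorem6_payoff_conditions(n, wb_a, p_c=0.1, p_v=0.17, wc_t=0.1):
    """Check p_V >= 2 WB_A / (2 WB_A + WP_C n) and p_V > WC_T / WB_A
    for the RG setting WP_C = WB_A / (n p_C)."""
    wp_c = wb_a / (n * p_c)
    threshold = 2 * wb_a / (2 * wb_a + wp_c * n)
    return p_v >= threshold, p_v > wc_t / wb_a, threshold


# Sweep the simulated grid: n in {9, 27, 81}, WB_A in {1, 1.1, ..., 2}.
results = [
    theorem6_payoff_conditions(n, 1 + k / 10)
    for n in (9, 27, 81)
    for k in range(11)
]
```

Because $WP_C\, n = WB_A / p_C$, the threshold reduces to $2/(2 + 1/p_C) = 1/6$ regardless of $n$ and $WB_A$, which is why a single value $p_V = 0.17$ works for every simulated configuration.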

For ROS, we set $p_V = (WB_A + 0.1)/(3 WB_A) + 0.01 > (WB_A + WC_T)/(WP_C + 2 WB_A)$ as required by OS. For the master payoffs, the aim in [18] was to focus on the master cost, making $MC_V = 20$ but zeroing $MP_W$ and $MB_R$ to exclude the impact of whether the correct answer is obtained or not. Here, we consider such impact in all three mechanisms, making $MC_V = MP_W = MB_R = n WB_A$, and we set $MC_A = WB_A$ assuming that the master cost of accepting an answer is just the payment that the worker receives (no overheads).

It is fair to notice that in ED the master checks the answers received by computing the task itself (audit), which is usually more costly than just verifying a given solution (e.g., all NP problems that are not in P). Moreover, when the master audits, it obtains the correct result even if all workers cheat, which is not the case when verifying. Given that the master is penalized for obtaining a wrong answer, its utility when verifying is negatively affected with respect to auditing, where the master receives no punishment. For these simulations, we maintain the same value for the master cost of verification or auditing, but in our model we zero the master punishment when all workers cheat (cf. Table III in the Appendix).

We assume that worker deviations occur in 0.5% of the computation rounds, and that these deviations continue until "fixed" (if possible). That is, we evaluate the performance of all three mechanisms for 200 rounds of computation introducing an identical initial perturbation in them. Specifically, we set $\lfloor n/2 \rfloor$ workers to start with $p_C = 0$ in ED and ROS, and $p_C$ as defined above for RG. For the remaining $\lceil n/2 \rceil$ workers, we evaluate the range $p_C \in \{0.5, 0.6, \ldots, 1\}$ in all three systems. Notice that this number of deviators is minimal to have an impact on voting schemes. ED will make the deviators converge to $p_C = 0$ by design. ROS does not include provisions for deviators, so they will not return to the desired behavior. For RG, we assume that, after being punished by peers in one round, workers return to the agreed equilibrium in the following round.

Under such conditions, we compute the number of rounds when the master obtains the correct answer, the master 
and worker utilities aggregated over all these rounds, and we measure the convergence time for ED and the time 
to detect the deviation for RG. We discuss the results of our simulations in the following section. 

B. Discussion 

The results of our simulations for $n = 9$ are shown in Figure 1. Similar results for $n = 27$ and $81$ are shown in Figures 2 and 3, left to the Appendix for brevity. The results shown correspond to one execution of the simulator. Multiple executions were carried out obtaining similar results.

It can be seen in Figure 1(a) that the number of rounds when the master obtained the correct answer is similar for RG and ED, except when the deviator $p_C$ comes closer to 1, where the performance of ED worsens. In general, both mechanisms achieve significantly better correctness than ROS.
Fig. 1. n = 9. (a) Number of correct rounds. (b) Cumulative master utility. (c) Cumulative follower worker utility. (d) Rounds to detection/convergence.


With respect to utilities, Figure 1(b) shows that RG is noticeably better than ED and ROS in master utility for most $(WB_A, p_C)$ combinations. Yet, for follower-worker utility (Figure 1(c)), RG is almost as good as ED, which is slightly better because a follower worker in ED never cheats. Both RG and ED are significantly better than ROS as the deviator $p_C$ becomes bigger.

The intuition on why this performance is achieved by RG can be obtained from Figure 1(d), where it can be seen that, for these specific scenarios, our mechanism detects deviations very fast, in contrast with the slow convergence of ED. It should be noticed that fast detection of deviations does not necessarily imply correctness, given that in RG the equilibrium is some $p_C > 0$. Yet, RG achieves correctness similar to or better than ED, and much better than ROS, even though in the latter two the compliant (stable for ED) worker behavior is $p_C = 0$.

VI. Conclusions 

Our simulations show that in the presence of $\lceil n/2 \rceil$ deviators, which is minimal to have an impact on voting schemes, even though the follower workers use $p_C > 0$ (in contrast with the other mechanisms where $p_C = 0$ for the followers), even assuming that the cost of verifying is the same as the cost of auditing, and even under the risk of unfair peer-punishment (i.e., workers may be punished even if they do not deviate, because the punishment decision is stochastic), our mechanism performs similarly or better than [13], and both significantly better than a repeated application of [18].

These experimental results, together with our theoretical analysis validating the application of the repeated games framework, demonstrate the benefit and the promise of applying repeated games to the master-worker paradigm. To the best of our knowledge, this is the first study of multi-round master-worker computing applying this framework.

A future extension of this work would be to enable the mechanism to also cope with malicious workers, that is, 
workers that either intentionally or due to software or hardware errors, return an incorrect task result (recall the 
relative discussion in the related section of the Introduction). Following HU, we could use statistical information 
on the distribution of the different worker types (malicious and rational). Then, the deviation threshold of our 
mechanism will need to be dependent on the expected number of malicious workers, so to keep motivating the 
rational workers to be truthful; we expect that the analysis will need to be significantly revised. Another future 
extension would be, as in Qa, to consider the possibility of groups of workers colluding in an attempt to increase 
their utility. For example, one worker could compute the task and inform the others of the correct result so that 
all return this result to the master (and hence all would obtain the master’s payment); or workers would return the 
same incorrect result in an attempt to trick the master into accepting an incorrect task result. The challenge here is
for the master to cope with these collusions, without knowing which specific workers are colluding. 
Appendix

Table II summarizes the parameter values used in our simulations and Table III shows the master and workers' utilities as derived under the specific parameter values. Figures 2 and 3 depict the simulation results for 27 and 81 workers, respectively. As can be observed, conclusions similar to those obtained for 9 workers can be derived here.
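Table II. Parameter values used in the simulations.

| Parameter | RG [this paper] | ROS [18] | ED [13] |
|---|---|---|---|
| $n$ | $\{9, 27, 81\}$ | $\{9, 27, 81\}$ | $\{9, 27, 81\}$ |
| $WB_A$ | $\{1, 1.1, 1.2, \ldots, 2\}$ | $\{1, 1.1, 1.2, \ldots, 2\}$ | $\{1, 1.1, 1.2, \ldots, 2\}$ |
| $WP_C$ | $WB_A/(n p_C)$ if $\lvert F \rvert < n$, 0 if $\lvert F \rvert = n$ | $WB_A$ | $WB_A$ |
| $WC_T$ | 0.1 | 0.1 | 0.1 |
| $p_C$ | $\lfloor n/2 \rfloor$: 0.1, $\lceil n/2 \rceil$: $\{0.5, 0.6, \ldots, 1\}$ | $\lfloor n/2 \rfloor$: 0, $\lceil n/2 \rceil$: $\{0.5, 0.6, \ldots, 1\}$ | $\lfloor n/2 \rfloor$: 0, $\lceil n/2 \rceil$: $\{0.5, 0.6, \ldots, 1\}$ |
| $p_V$ | 0.17 | $(WB_A + 0.1)/(3 WB_A) + 0.01$ | initially: 0.5, min: 0.01 |
| $MC_A$ | $WB_A$ | $WB_A$ | $WB_A$ |
| $MC_V$ | $n WB_A$ | $n WB_A$ | $n WB_A$ |
| $MP_W$ | $n WB_A$ | $n WB_A$ | $n WB_A$ |
| $MB_R$ | $n WB_A$ | $n WB_A$ | $n WB_A$ |
| other | $\varepsilon = 0.01$ | - | $\tau = 0.5$, $a_w = 0.1$, $\alpha_m = \alpha_w = 0.01$ |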
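Table III. Master and worker utilities under the simulation parameters. Where the source gives a single value for a row, it applies to all three mechanisms.

| Utility | RG [this paper] | ROS [18] | ED [13] |
|---|---|---|---|
| $U_i(\text{verified}, s_i = \bar{C})$ | $WB_A - WC_T$ | $WB_A - WC_T$ | $WB_A - WC_T$ |
| $U_i(\text{verified}, s_i = C)$ | $-WP_C \lvert F \rvert$ | $-WP_C$ | $-WP_C$ |
| $U_i(\text{not verified}, s_i = C, \lvert F \rvert > n/2)$ | $WB_A$ | $WB_A$ | $WB_A$ |
| $U_i(\text{not verified}, s_i = \bar{C}, \lvert F \rvert > n/2)$ | $-WC_T$ | $-WC_T$ | $-WC_T$ |
| $U_i(\text{not verified}, s_i = C, \lvert F \rvert < n/2)$ | 0 | 0 | 0 |
| $U_i(s_i = \bar{C}, \lvert F \rvert < n/2)$ | $WB_A - WC_T$ | $WB_A - WC_T$ | $WB_A - WC_T$ |
| $U_M(\text{verified}, \lvert F \rvert < n)$ | $MB_R - MC_V - (n - \lvert F \rvert) MC_A + \lvert F \rvert^2 WP_C$ | $MB_R - MC_V - (n - \lvert F \rvert) MC_A + \lvert F \rvert WP_C$ | $MB_R - MC_V - (n - \lvert F \rvert) MC_A + \lvert F \rvert WP_C$ |
| $U_M(\text{verified}, \lvert F \rvert = n)$ | $-MC_V + n^2 WP_C$ | $MB_R - MC_V + n WP_C$ | $MB_R - MC_V + n WP_C$ |
| $U_M(\text{not verified}, \lvert F \rvert > n/2)$ | $-MP_W - \lvert F \rvert MC_A$ | $-MP_W - \lvert F \rvert MC_A$ | $-MP_W - \lvert F \rvert MC_A$ |
| $U_M(\text{not verified}, \lvert F \rvert < n/2)$ | $MB_R - (n - \lvert F \rvert) MC_A$ | $MB_R - (n - \lvert F \rvert) MC_A$ | $MB_R - (n - \lvert F \rvert) MC_A$ |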
Fig. 2. n = 27. (a) Number of correct rounds. (b) Cumulative master utility. (c) Cumulative follower worker utility. (d) Rounds to detection/convergence. Legend: RG, ED, ROS.
Fig. 3. n = 81. Panels: Number of correct rounds; Cumulative master utility; (c); (d).




















